Topological centrality is a significant measure for characterising the relative importance of a node in a 
complex network. For directed networks that model dynamic processes, however, it is of more practical 
importance to quantify a vertex's ability to dominate (control or observe) the state of other vertices. In this 
paper, based on the determination of controllable and observable subspaces under the global minimum-cost 
condition, we introduce a novel direction-specific index, domination centrality, to assess the intervention 
capabilities of vertices in a directed network. Statistical studies demonstrate that the domination centrality 
is, to a great extent, encoded by the underlying network's degree distribution and that most network 
positions through which one can intervene in a system are vertices with high domination centrality rather 
than network hubs. To analyse the interaction and functional dependence between vertices when they are 
used to dominate a network, we define the domination similarity and detect significant functional modules 
in glossary and metabolic networks through clustering analysis. The experimental results provide strong 
evidence that our indices are effective and practical in accurately depicting the structure of directed 
networks. 



Studies of the structure and function of complex networks can yield a variety of useful quantities or measures 
that capture particular features of social, biological and information-technology systems 1 . In this context, 
the concept of centrality addresses the most important or central vertices in a network. Despite the diversity 
of systems, several basic, universal measures of centrality have been developed to rank the vertices of a network 
according to their topological importance, including the vertex degree, betweenness 2,3 , closeness 4 , eigenvector 5 , 
subgraph 6 , PageRank 7 and various types of random walks 8,9 . Although these measures have significantly enriched 
our understanding of many networks, our ultimate goal is to locate the most significant vertices that have the 
ability to dominate the networks. 

Although the actual domination of complex networks has not yet been achieved, a necessary 
stepping stone is to understand the controllability and observability of complex networks, which has become a 
topic of active pursuit 38-42 . Based on control theory, Liu et al. 10 have proposed an efficient methodology for 
identifying the minimum driver vertex set (MDS), the time-dependent control of which can guide the entire 
network to any desired final state 11,12 . The number of elements of the MDS, $N_D$, is thus a key quantity of interest, as 
it characterises the cost of bringing the system under full control. The proposed maximum matching link set M 
can be used to assess and quantify structural controllability. On the other hand, a network is observable if its 
internal state can be determined from the given output vertex set, where observability depends on both the 
number and placement of the output vertices 13 . Liu et al. 14 have adopted a graphical approach to determining the 
set of output vertices that are not only necessary but also sufficient for the observability of a complex network. 
Specifically, given a complex-networked dynamical system, the controllable subspace reflects the control 
capability of a vertex when we input a signal at that single vertex only, and the observable subspace reflects the 
observation capability of a vertex when we measure the output from that single vertex only. Recently, Liu et al. 15 
and Wang et al. 16 have further introduced the concept of control centrality to quantify the ability of a single vertex 
to control a directed weighted network. However, the use of only the control capability to quantify the vertex 
centrality is not comprehensive, as a vertex may directly intervene only in its downstream subspace from the 
viewpoint of controllability. For example, in figure 1, by inputting a signal at the vertex i, the state variables in the 
downstream system $S_2$ can be controlled. This is the manner in which a vertex controls its downstream system, 

Figure 1 | A schematic diagram illustrating the physical meaning of 
domination capability. By inputting a signal at the vertex i, the state 
variables in the downstream system $S_2$ can be controlled. By observing the 
state variables through measurement of the state of vertex i, a feedback loop can be 
constructed to control the upstream system $S_1$. 

but this process embodies only one aspect of a vertex's power in 
dominating a system. If the state variables are looped back, the 
feedback signal can then control a system within itself. State feedback is 
self-related and helps to maintain stability in a system despite 
external changes. However, state feedback can be established only 
on the condition that all state variables are measurable. If not, the 
state variables must be estimated by utilising a state observer. In 
figure 1, the state observer obtains the state variables of the upstream 
system $S_1$ by measuring the state of vertex i; the feedback loop can 
then be constructed to control $S_1$. In this manner, a vertex can control 
its upstream system through feedback, and this process reflects 
another aspect of a vertex's power in dominating a system. 
Therefore, the ability to examine the role that a vertex plays in both 
controlling the downstream subspace and observing the upstream 
subspace is an issue of significant practical interest in vertex 
centrality. 

In this paper, we focus on the domination centrality (DC) index to 
assess the capabilities of vertices in directed networks. Intuitively, 
domination centrality includes two aspects: control capability and 
observation capability. Under the minimum-control-cost condition, 
for a single vertex, the control capability captures the dimension of 
the controllable subspace and quantifies the influence that can be 
exerted on the downstream subnetwork through this vertex. 
Similarly, the observation capability captures the dimension of the 
observable subspace and quantifies the intervention that can be 
exerted on the upstream subnetwork through this vertex. The 
purpose of emphasising the global minimum cost is to determine the 
responsibility and capability of each individual vertex cooperating 
with others in dominating the entire system. Mathematically, 
domination centrality is the harmonic mean of these two capabilities and 
represents the capability of a vertex synthetically. This approach is in 
good agreement with our original notion regarding the "power" of a 
vertex in dominating the entire network. Inspired by this general 
consideration, we perform statistical studies of the index DC for 
several types of real-world directed networks, including citation, 
metabolic, glossary and synthetic networks, and analyse the 
underlying topological factors by which the distribution of DC is primarily 
determined. To uncover the relation between DC and the functions of vertices, a clustering 
analysis is presented based on the intuitive assumption that vertices 
that control and observe the same subspaces tend to serve identical 
functions in a network. Our domination centrality index bridges the 
concepts of directed network topology and function by providing 
useful insights into the effect of the former on the latter from the 
viewpoint of cybernetics. 

Results 

Domination centrality. Consider the linear time-invariant dynamic 
system $\dot{X}(t) = A X(t) + B u(t)$, $Y(t) = C X(t)$, with the state vector 
$X \in \mathbb{R}^{n}$, the adjacency matrix $A \in \mathbb{R}^{n \times n}$, the input matrix 
$B \in \mathbb{R}^{n \times m}$, the control vector $u \in \mathbb{R}^{m}$, the output matrix 
$C \in \mathbb{R}^{r \times n}$ and the output vector $Y \in \mathbb{R}^{r}$. The underlying directed 
network of this system is denoted by G(A), with vertex set V and 
link set L. The rank of the $n \times nm$ controllability matrix 
$Q_c = [B, AB, A^2 B, \ldots, A^{n-1} B]$, which is denoted by rank($Q_c$), provides the 
dimension of the controllable subspace of the structural system (A, 
B, C) 17,18 . (A, B, C) is completely controllable 12 iff rank($Q_c$) = n. 
Analogously, the rank of the $nr \times n$ observability matrix 
$Q_o = [C^T, (CA)^T, (CA^2)^T, \ldots, (CA^{n-1})^T]^T$, which is denoted by rank($Q_o$), 
provides the dimension of the observable subspace of this system. 
(A, B, C) is completely observable iff rank($Q_o$) = n. Furthermore, the 
duality theorem 12 indicates that system (A, B, C) is completely 
controllable if and only if system ($A^T$, $C^T$, $B^T$) is completely 
observable, and vice versa. 
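
As a hedged illustration of the rank condition above, consider a hypothetical three-vertex chain 1 → 2 → 3 that is not taken from the paper. The sketch assumes the common convention that the entry $a_{ij}$ of A is the weight of the link from vertex j to vertex i, and that the weights $a_{21}$ and $a_{32}$ are non-zero.

```latex
A = \begin{pmatrix} 0 & 0 & 0 \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{pmatrix}, \qquad
b^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad
b^{(3)} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

% Input at vertex 1: the three columns are linearly independent.
Q_c = \left[ b^{(1)},\; A b^{(1)},\; A^2 b^{(1)} \right]
    = \begin{pmatrix} 1 & 0 & 0 \\ 0 & a_{21} & 0 \\ 0 & 0 & a_{32} a_{21} \end{pmatrix},
\qquad \mathrm{rank}(Q_c) = 3

% Input at vertex 3: A b^{(3)} = 0, so only one non-zero column remains.
Q_c = \left[ b^{(3)},\; A b^{(3)},\; A^2 b^{(3)} \right]
    = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
\qquad \mathrm{rank}(Q_c) = 1
```

With the input at vertex 1 the whole chain is controllable for any non-zero weights, whereas an input at vertex 3 reaches only vertex 3 itself.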

Liu's Minimum Input Theorem 10 states that the minimum 
number of driver vertices ($N_D$) required to fully control a network G(A) is 
one if there is a perfect matching in G(A). Otherwise, it is equal to the 
number of unmatched vertices with respect to any maximum 
matching: $N_D = \max\{n - |M|, 1\}$. A maximum matching is a link set $M \subseteq L$ 
with maximum cardinality (size) such that no two links in M share a 
common starting vertex or a common ending vertex. A vertex is 
matched if it is an ending vertex of a link in M. |M| denotes the size 
of the maximum matching. 
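
The matching definition above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used in the paper: it assumes the network is given as a list of (start, end) links whose endpoints all appear in the vertex list, and it uses a simple augmenting-path search. The function names `maximum_matching` and `minimum_driver_count` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Link = Tuple[Hashable, Hashable]


def maximum_matching(vertices: Iterable[Hashable], links: Iterable[Link]) -> Set[Link]:
    """Return one maximum matching of a directed network as a set of links.

    No two returned links share a starting vertex or an ending vertex.
    Assumption: every link endpoint is listed in `vertices`.
    """
    successors: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for start, end in links:
        successors[start].append(end)

    # Maps each matched ending vertex to the starting vertex of its matched link.
    matched_from: Dict[Hashable, Hashable] = {}

    def try_augment(start: Hashable, visited: Set[Hashable]) -> bool:
        # Recursive augmenting-path search; very long paths may need an
        # iterative rewrite to stay within Python's recursion limit.
        for end in successors[start]:
            if end in visited:
                continue
            visited.add(end)
            if end not in matched_from or try_augment(matched_from[end], visited):
                matched_from[end] = start
                return True
        return False

    for start in successors:
        try_augment(start, set())
    return {(start, end) for end, start in matched_from.items()}


def minimum_driver_count(vertices: Iterable[Hashable], links: Iterable[Link]) -> int:
    """Evaluate N_D = max{n - |M|, 1} for one maximum matching M."""
    vertex_list = list(vertices)
    matching = maximum_matching(vertex_list, list(links))
    return max(len(vertex_list) - len(matching), 1)
```

For example, a directed star in which vertex 1 points to vertices 2, 3 and 4 admits a matching of size one, so three driver vertices are needed, while a directed cycle has a perfect matching and needs only one.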

For a given maximum matching link set M of a directed network 
G(A), the minimum-control-cost configuration $CF(V, M \cup AL)$ 
carries the structural information of complete control 16 . CF is a 
spanning subnetwork of G(A), with vertex set V and link set $M \cup AL \subseteq L$. 
M is a stem-cycle disjoint cover of G(A) and indicates the directed 
routes along which the input control signals are transmitted. AL is 
the set of additional links that begin in vertices of stems (except the 
top vertices) and end in vertices of cycles. The $n \times n$ adjacency matrix 
A(M) is used to indicate the wiring diagram of the spanning 
subnetwork CF that corresponds to the maximum matching link set M of 
G(A). As an example, in figure 2(a), the red links are elements of a 
maximum matching. When vertices are connected by red links, the 
network thus constructed is composed of vertex-disjoint stems (two 
in shades of green) and cycles (four in shades of red); l 3J and l 4j9 are 
the additional links that connect the stems and cycles. 

The controllability of a complex network concentrates on the 
interaction structure in which the pattern of influence may be 
known, but not the specific extent of influence. In response to 
unknown or uncertain edge weights, the controllability is used to 
uncover the generic properties of systems, independent of parameter 
values. The cactus is the most economical topology-structure pattern 
to propagate control influence, since the cactus is a minimal structure 
such that removing any link will render the structure uncontrollable. 
A maximum matching shows the important links by which we can 
construct the cactus structures efficiently in a complex system. 
Therefore, the maximum matching not only reveals the minimum 
driver set but also constitutes a backbone of the key control routes, 
which form a stem-cycle cover of the original network. The 
minimum-control-cost configuration CF is constructed precisely to show the 
backbone of the propagation of control influence. 

To quantify the control capability of a single vertex i under the 
minimum-control-cost condition, B reduces to the vector $b^{(i)}$ with a 
single non-zero entry, and A reduces to the matrix A(M). Then, the 
control capability of a single vertex i can be defined as 

$$\mathrm{rank}(Q_c^i(M)) = \mathrm{rank}\left[ b^{(i)},\; A(M)\, b^{(i)},\; (A(M))^2 b^{(i)},\; \ldots,\; (A(M))^{n-1} b^{(i)} \right]. \qquad (1)$$

Lin's theorem 11 has demonstrated that a linear control system (A, B) 
is structurally controllable if and only if the associated digraph G(A) 
can be spanned by cacti. A cactus is a subnetwork in the form of a 
distinct stem or a stem connected to several buds. A stem is simply an 
elementary path that originates from an input vertex. The initial (or 
terminal) vertex of a stem is known as the root (or top) of the stem. A 
bud is an elementary cycle with an additional link that ends, but does 



SCIENTIFIC REPORTS | 4:5399 | DOI: 10.1038/srep05399 




[Figure 2 panel annotations. Controllable subspaces: $CS^1(M_1)$ with rank($Q_c^1(M_1)$) = 10 and $CS^2(M_2)$ with rank($Q_c^2(M_2)$) = 7. 
Observable subspaces: $OS^1(M_1^T)$ with rank($Q_o^1(M_1^T)$) = 7 and $OS^2(M_2^T)$ with rank($Q_o^2(M_2^T)$) = 10. 
Domination centrality: $DC^1 = DC^2 = 2/(1/10 + 1/7) \approx 8.2$. 
Complete subspaces: $CS^i = \bigcup_k CS^i(M_k)$ and $OS^i = \bigcup_k OS^i(M_k^T)$. 
Similarities: $JC(i,j) = \mathrm{Jaccard}(CS^i, CS^j)$, $JO(i,j) = \mathrm{Jaccard}(OS^i, OS^j)$ and $DS(i,j) = \mathrm{agm}(JC(i,j), JO(i,j))$.] 

Figure 2 | A schematic diagram illustrating the domination centrality and the domination similarity of vertices in a directed network. (a): A maximum 
matching $M_1$, consisting of red links, forms a stem (in shades of green)-cycle (in shades of red) disjoint cover of the network. Additional links (AL) that 
connect the stems and cycles are highlighted by bold dashed lines. (b): The controllable subspace of vertex 1 is highlighted by a purple dotted line, and its 
observable subspace is highlighted by a green dotted line. The domination centrality of vertex 1 is the harmonic mean of the sizes of these two subspaces. 
(c): Another maximum matching, $M_2$, is given. (d): The controllable subspace of vertex 2 is highlighted by an orange dotted line, and its observable 
subspace is highlighted by a blue dotted line. (e): The overlapping phenomenon of the controllable subspaces and the observable subspaces is depicted. 
The Jaccard similarity coefficients of the controllable subspaces and the observable subspaces are calculated, and the arithmetic-geometric mean thereof is 
used to determine the domination similarity. 



not begin, in a vertex of the cycle, and the top vertex of the stem is not 
the initial vertex of any additional link. The network can be spanned 
by cacti using links of A(M). Thus, A(M) demonstrates the manner in 
which vertices control the entire network under the 
minimum-control-cost condition. When the vertex i is taken as an input vertex, the 
subspace that is accessible from vertex i in the spanning subnetwork 
CF is cactus-structured and structurally controllable. A vertex j is 
called accessible if there is at least one directed path that passes from 
the input vertex i to vertex j. For example, in figure 2(b), the 
accessible subspace of vertex 1 is highlighted in bold purple and spanned by 
the links of this CF in the form of a cactus. 

We can therefore use the size of the accessible subspace of vertex i 
as an accurate measure of rank($Q_c^i(M)$). Thus, equation (1) can be 
represented by 

$$\mathrm{rank}(Q_c^i(M)) = |CS^i(M)|, \qquad (2)$$

where 

$$CS^i(M) = \{\, j \mid j \text{ is accessible from vertex } i \text{ in } CF(V, M \cup AL) \,\} \qquad (3)$$

is the set of vertices in the controllable subspace of vertex i. 
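
Equations (2) and (3) reduce the rank computation to a reachability search. The following sketch shows one way to do this with a breadth-first search. It assumes the link list of the configuration CF (the matching links together with the additional links) has already been built; the names `accessible_set` and `control_capability` are hypothetical, and the input vertex is counted as accessible from itself.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Link = Tuple[Hashable, Hashable]


def accessible_set(source: Hashable, cf_links: Iterable[Link]) -> Set[Hashable]:
    """Vertices reachable from `source` along directed links, including `source`."""
    successors: Dict[Hashable, List[Hashable]] = {}
    for start, end in cf_links:
        successors.setdefault(start, []).append(end)

    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in successors.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def control_capability(source: Hashable, cf_links: Iterable[Link]) -> int:
    """Size of the controllable subspace of `source`, following equation (2)."""
    return len(accessible_set(source, cf_links))
```

Counting the input vertex itself keeps the smallest possible capability equal to one, which matches a vertex that controls nothing but itself.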

By invoking the duality between controllability and observability 
in a linear system, it can be seen that the driver vertices in network 
G(A) for inputting signals are simply the output vertices for 
measurement in the transposed network $G(A^T)$, which is obtained by 
inverting the direction of all links. The network $G(A^T)$ is guaranteed 
to be observable by monitoring those output vertices. Thus, all our 
controllability conditions can be readily extended to the observability 
case. The link set $M^T$ that is obtained by inverting the direction of all 
links in M forms a maximum matching link set of $G(A^T)$. Thus, the 
minimum-observation-cost configuration (OF) can be defined as 
$OF(V, M^T \cup AL')$, where $AL'$ is the set of additional links in $G(A^T)$, 
and $A^T(M^T)$ can be used to indicate the wiring diagram of OF that 
corresponds to $M^T$ in $G(A^T)$. As an example, in figure 2(a), l 12 ,is is the 
only additional link in $AL'$. 

To quantify the observation capability of a single vertex i under 
the minimum-observation-cost condition, the output matrix $B^T$ 
reduces to the vector $(b^{(i)})^T$ with a single non-zero entry, and $A^T$ 
reduces to the matrix $A^T(M^T)$. Then, the observation capability can 
be represented by the size of the observable subspace $OS^i(M^T)$ of 
vertex i in $OF(V, M^T \cup AL')$ and can be accurately measured as 
follows: 

$$\mathrm{rank}(Q_o^i(M^T)) = \mathrm{rank}\left[ (b^{(i)})^T,\; (b^{(i)})^T A^T(M^T),\; (b^{(i)})^T (A^T(M^T))^2,\; \ldots,\; (b^{(i)})^T (A^T(M^T))^{n-1} \right], \qquad (4)$$

$$\mathrm{rank}(Q_o^i(M^T)) = |OS^i(M^T)|, \qquad (5)$$

where 

$$OS^i(M^T) = \{\, j \mid j \text{ is accessible from vertex } i \text{ in } OF(V, M^T \cup AL') \,\}. \qquad (6)$$


Considering the role that a vertex plays in both controlling the 
downstream subspace and observing the upstream subspace, the 
domination centrality (DC) index for the assessment of the 
capabilities of vertices in directed networks can be synthetically 
defined as the harmonic mean of a vertex's control capability 
and observation capability. The domination centrality of vertex i 
is represented by 

$$DC^i = \frac{2}{\dfrac{1}{\mathrm{rank}(Q_c^i(M))} + \dfrac{1}{\mathrm{rank}(Q_o^i(M^T))}}. \qquad (7)$$

The DC index is used to detect the most powerful vertex through 
which we can not only control but also observe a network. 
Therefore, as the harmonic mean of the control capability and 
observation capability of the vertex, DC will be significant only 
when the control capability and observation capability attain high 
values simultaneously. In figure 2(b), for the given maximum 
matching $M_1$, the controllable subspace of vertex 1, $CS^1(M_1)$, is 
highlighted by a purple dotted line and has rank($Q_c^1(M_1)$) = 10, 
and the observable subspace, $OS^1(M_1^T)$, is highlighted by a green 
dotted line and has rank($Q_o^1(M_1^T)$) = 7; thus, the domination 
centrality of vertex 1 is $DC^1 = 2/(1/10 + 1/7) \approx 8.2$, and vertex 1 is 
powerful in dominating the network. By contrast, vertex 14 has the 
highest value of control capability but a very small observation 
capability, meaning that $DC^{14} = 2/(1/13 + 1/1) \approx 1.9$. Thus, 
vertex 1 has a stronger overall ability to dominate the network 
than does vertex 14. In the worst case, when a vertex i can only 
control and observe itself, DC = 1. 
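
The arithmetic in this paragraph can be checked with a short sketch of equation (7). The function name `domination_centrality` is hypothetical; the two arguments are the sizes of the controllable and observable subspaces, each assumed to be at least one because a vertex always controls and observes itself.

```python
def domination_centrality(control_capability: int, observation_capability: int) -> float:
    """Harmonic mean of the control and observation capabilities of one vertex."""
    if control_capability < 1 or observation_capability < 1:
        raise ValueError("both capabilities must be at least 1")
    return 2.0 / (1.0 / control_capability + 1.0 / observation_capability)


# Worked values quoted in the text for the toy network of figure 2.
dc_balanced = domination_centrality(10, 7)   # control 10, observation 7
dc_lopsided = domination_centrality(13, 1)   # control 13, observation 1
dc_isolated = domination_centrality(1, 1)    # controls and observes only itself
```

Because the harmonic mean is dominated by the smaller argument, a vertex with a large controllable subspace but a minimal observable subspace still scores below 2.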

Furthermore, we note that there are multiple different maximum 
matchings (N! matchings for a completely connected network). Each 
one illustrates a unique manner in which vertices may control and 
observe the entire network under a minimum-cost condition. 
Therefore, in combination with other vertices, a vertex may play 
several different roles in dominating a network. Thus, we may ask 
this question: in all possible minimum-cost configurations, if two 
vertices can perform similar control and observation functions, does 
that fact indicate that they can also serve similar functions in 
intervening in the system? To answer this question, the domination 
similarity (DS) is defined as 



$$DS(i,j) = \mathrm{agm}(JC(i,j),\, JO(i,j)), \qquad (8)$$

where agm(x, y) is the arithmetic-geometric mean 29 of two positive 
real numbers x and y. We calculate the Jaccard similarity coefficient 
of the complete controllable subspaces of i and j to determine their 
control-function similarity. Meanwhile, the Jaccard similarity 
coefficient of the complete observable subspaces of i and j is calculated to 
determine their observation-function similarity. The complete 
controllable subspace is $CS^i = \bigcup_{k=1}^{K} CS^i(M_k)$ and the complete observable 
subspace is $OS^i = \bigcup_{k=1}^{K} OS^i(M_k^T)$, where K is the number of different 
maximum matchings. $JC(i,j) = \mathrm{Jaccard}(CS^i, CS^j)$ and 
$JO(i,j) = \mathrm{Jaccard}(OS^i, OS^j)$. agm(x, y) is a number between the geometric 
and arithmetic means of x and y; thus, DS(i, j) will be significant only 
when JC(i, j) and JO(i, j) attain high values simultaneously. 
Furthermore, in the case that there is a large difference between 
the two quantities, agm(x, y) yields a more reasonable result than 
the arithmetic or harmonic mean. 
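
A minimal sketch of equation (8) follows. It assumes the complete controllable and observable subspaces of each vertex are already available as Python sets; the function names are hypothetical. Two conventions are assumptions rather than statements from the paper: the Jaccard coefficient of two empty sets is taken as 0, and agm returns 0 when either argument is 0, which is the limiting value of the iteration.

```python
import math
from typing import Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity coefficient |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def agm(x: float, y: float, rel_tol: float = 1e-12) -> float:
    """Arithmetic-geometric mean of two non-negative numbers."""
    if x < 0 or y < 0:
        raise ValueError("agm is defined here for non-negative arguments only")
    if x == 0 or y == 0:
        return 0.0  # assumed limiting value, see the lead-in
    a, g = x, y
    for _ in range(100):  # converges quadratically; the bound is a safety net
        if abs(a - g) <= rel_tol * max(a, g):
            break
        a, g = (a + g) / 2.0, math.sqrt(a * g)
    return (a + g) / 2.0


def domination_similarity(cs_i: Set[Hashable], cs_j: Set[Hashable],
                          os_i: Set[Hashable], os_j: Set[Hashable]) -> float:
    """DS(i, j) = agm(JC(i, j), JO(i, j)) for precomputed complete subspaces."""
    return agm(jaccard(cs_i, cs_j), jaccard(os_i, os_j))
```

For instance, `domination_similarity({1, 2, 3}, {2, 3, 4}, {1, 5}, {1, 5})` combines a control similarity of 0.5 with an observation similarity of 1.0 and returns a value between their geometric and arithmetic means.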

Figure 2 illustrates this concept. $M_1$ and $M_2$ are two different 
maximum matchings of this toy network, with links highlighted 
in red in figure 2(a) and figure 2(c), respectively. We concentrate on 
the domination capabilities of vertices 1 and 2. In figure 2(b), for 
vertex 1, the controllable subspace $CS^1(M_1)$ in $CF(A(M_1))$ is 
indicated in purple, and the observable subspace $OS^1(M_1^T)$ in 
$OF(A^T(M_1^T))$ is indicated in green. Similarly, the situation for vertex 
2 is illustrated in figure 2(d), with orange and blue colours 
corresponding to $M_2$. A great deal of information regarding the functions of 
these two vertices can be determined based on the overlapping of 
their controllable and observable subspaces, as shown in figure 2(e). 

Distribution of domination centrality. A structural system can be 
shown to be controllable for almost all weight combinations 10 , and the 
dimension of the controllable subspace is stable, in the sense that for 
almost any set of system parameters, the dimension is equal to some 
maximal constant (the generic rank of the controllability matrix) 19 ; 
all these properties also hold for observability 14 . Thus, to some extent, 
domination centrality and domination similarity can be calculated 
without assessing the link weights. This property is one of the greatest 
advantages of controllability-based topological measures: they are 
robust to uncertainty in link weights, which frequently arises in 
networks constructed from real data, such as biological networks. 

In this section, we perform statistical studies of the domination 
centrality on several types of directed networks, including real-world 
citation, glossary and metabolic networks and synthetic scale-free networks, as 
summarised in table 1. We have manually reconstructed the global 
human enzyme-centric network based on data available in the 
August 2009 release of the Kyoto Encyclopedia of Genes and 
Genomes (KEGG) 20 . The citation and glossary networks are drawn 
from Pajek datasets and can be downloaded at 
http://vlado.fmf.uni-lj.si/pub/networks/data/. The synthetic scale-free networks were 
constructed using the method of Fan et al. 21 . In table 1, we provide 
the statistical values of the numbers of vertices (n), links (m) and 
minimum driver vertices ($N_D$) for the original networks. 

We first consider the distribution of the domination centrality. For 
a given network, any existing algorithm 22,23 can be used to compute a 
maximum matching M. For this M, the domination centrality reveals 
the responsibility and capability of each individual vertex in 
controlling and observing the system with the global minimum cost. Figure 3 
presents the distribution of the domination centrality for the 
synthetic scale-free networks listed in table 1. In double-logarithmic 
coordinates, the relation between the DC value and the probability 
P(DC) is nearly linear, suggesting the coexistence of a few powerful 

Table 1 | Summarized statistics for the original representative networks 

Type | Name | n | m | $N_D$ 
Glossary | GlossTG | 67 | 122 | 32 
Citation | SmaGri | 1024 | 4918 | 511 
Citation | SciMet | 2729 | 10412 | 1156 
Citation | Kohonen | 3772 | 12729 | 2115 
Metabolic (enzyme-centric) | Homo sapiens | 689 | 2382 | 149 
Synthetic scale-free | SF γ = 2.1 | 5000 | 14972 | 2059 
Synthetic scale-free | SF γ = 2.4 | 5000 | 14989 | 1583 
Synthetic scale-free | SF γ = 3 | 5000 | 14996 | 1007 
Synthetic scale-free | SF γ = 4 | 5000 | 14997 | 532 

Synthetic scale-free networks: $\langle k_{in} \rangle = \langle k_{out} \rangle = 3$, $P(k_{in}) \sim k_{in}^{-\gamma}$, $P(k_{out}) \sim k_{out}^{-\gamma}$. 



vertices and a large number of vertices that have little domination 
over the system's dynamics. However, we also must consider that 
even the most powerful vertex can dominate only a small local 
subspace within the entire system, and thus, it is preferable to identify 
multiple collaborating vertices for the domination of the entire 
system. Therefore, cooperative relations among vertices are of 
significant concern. This is why we insist on measuring all vertices' 
capabilities in collaboratively dominating an entire network in a 
certain minimum-control-cost configuration. 

To statistically explain which topological features determine the 
distribution of the domination centrality itself, we compare the DCs 
of each vertex in the real networks and their randomised 
counterparts (denoted as rand-ER and rand-Degree). A full randomisation 
procedure (rand-ER) turns the network into a fairly homogeneous, 
directed Erdős-Rényi random network 24 . The domination centrality 
values in rand-ER ($DC^{ER}$) and the corresponding number of driver 
vertices ($N_D^{ER}$) change dramatically, as shown in table 2. For almost all 
networks, there is no correlation between DC and $DC^{ER}$, indicating 
that full randomisation eliminates the topological characteristics that 
influence domination centrality. We also apply a degree-preserving 
randomisation (rand-Degree) 25 , which leaves the in-degree, $k_{in}$, and 
the out-degree, $k_{out}$, of each vertex unchanged but randomly selects 
which vertices link to each other. We find that this procedure does 
not significantly alter the number of driver vertices ($N_D^{Degree}$) or the 
domination centrality ($DC^{Degree}$). For example, in figure 4(b, c, d), we 
present scatter plots of the DC values versus the in-degree, out-degree 
and degree in the Homo sapiens network; the results for the real 
network (green) are reasonably consistent with those for the 
rand-Degree counterparts (blue), whereas the results for the rand-ER 
counterparts (purple) are significantly different from the others. 
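
One common way to realise a degree-preserving randomisation is repeated link swapping, sketched below. This is an illustrative assumption: the text does not spell out the procedure of ref. 25, and the function name `degree_preserving_randomisation`, the `n_swaps` parameter and the rule of rejecting swaps that would create self-loops or duplicate links are all choices made for this sketch. The input is assumed to contain no duplicate links.

```python
import random
from typing import Hashable, Iterable, List, Optional, Tuple

Link = Tuple[Hashable, Hashable]


def degree_preserving_randomisation(links: Iterable[Link], n_swaps: int,
                                    seed: Optional[int] = None) -> List[Link]:
    """Rewire a directed network while keeping every in- and out-degree fixed.

    Each attempt picks two links (a, b) and (c, d) and replaces them with
    (a, d) and (c, b). The attempt is skipped if it would create a self-loop
    or a link that already exists.
    """
    rng = random.Random(seed)
    rewired = list(links)
    present = set(rewired)
    if len(rewired) < 2:
        return rewired

    for _ in range(n_swaps):
        i, j = rng.sample(range(len(rewired)), 2)
        (a, b), (c, d) = rewired[i], rewired[j]
        first, second = (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        if first in present or second in present:
            continue  # would duplicate an existing link
        present.discard((a, b))
        present.discard((c, d))
        present.add(first)
        present.add(second)
        rewired[i], rewired[j] = first, second
    return rewired
```

Every accepted swap keeps the starting vertices a and c and the ending vertices b and d in use exactly once each, so the in-degree and out-degree of every vertex are unchanged.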



In addition, we calculate the mean, the average of absolute 
deviation 26 and the relative entropy 27 for the distribution of the 
domination centrality in each real network and their random counterparts in 
table 3. Compared to the real networks, the rand-Degree 
counterparts yield similar mean values, similar averages of absolute deviation 
and small relative entropies. The same indices of the rand-ER 
counterparts differ significantly in comparison. From all these 
observations, we conclude that domination centrality is, to a great extent, 
encoded by the degree distribution of the underlying network. 
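
The three summary statistics can be sketched as follows. The text does not state how the DC distributions were discretised before computing the relative entropy, so the fixed-width binning (`bin_width`) and the small `pseudo_count` that keeps empty bins from producing infinite values are assumptions of this illustration, as are the function names.

```python
import math
from collections import Counter
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def average_absolute_deviation(values: Sequence[float]) -> float:
    """Mean of |v - mean(values)| over all values."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)


def relative_entropy(p_values: Sequence[float], q_values: Sequence[float],
                     bin_width: float = 1.0, pseudo_count: float = 1e-9) -> float:
    """Kullback-Leibler divergence D(P || Q) between two binned samples.

    Both samples are histogrammed on a shared grid of width `bin_width`;
    `pseudo_count` is added to every bin so that the logarithm stays finite.
    """
    def histogram(values: Sequence[float]) -> Counter:
        return Counter(math.floor(v / bin_width) for v in values)

    p_hist, q_hist = histogram(p_values), histogram(q_values)
    bins = set(p_hist) | set(q_hist)
    p_total = len(p_values) + pseudo_count * len(bins)
    q_total = len(q_values) + pseudo_count * len(bins)

    divergence = 0.0
    for b in bins:
        p = (p_hist[b] + pseudo_count) / p_total
        q = (q_hist[b] + pseudo_count) / q_total
        divergence += p * math.log(p / q)
    return divergence
```

With this definition, a randomised network whose DC histogram matches that of the real network gives a relative entropy near zero, and larger values indicate a larger mismatch.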

Another interesting phenomenon observed in this study is that the 
hubs (vertices of high degree) do not tend to play more important 
roles in dominating a system. We divide the vertices into three 
groups of equal size according to their degree k (low, medium and 
high) and calculate the average values of DC among the low-degree, 
medium-degree and high-degree vertices. As table 2 demonstrates, 
for real networks and two random network models (Erdős-Rényi 24 
and scale-free 21,25 ), the average DC value of the set of low-degree 
vertices is not significantly lower than that of the set of hubs in each 
case. Figure 4(a) graphically represents the values for the Homo 
sapiens network. In figure 4(b, c, d), as expected, in all cases, a 
low-degree vertex can also have a significant domination centrality. 
For a vertex with a degree equal to 1, either the control capability or 
the observation capability must also be equal to 1; thus, as the 
harmonic mean of these two capabilities, the domination centrality must 
be less than 2. Intuitively, a vertex with a degree of 1 must have either 
no downstream space it can control or no upstream space it can 
observe. This is the reason why the hubs are observed to attain 
slightly larger DC values than the low-degree vertices. To conclude, 
this experimental study demonstrates that there is no obvious 
correlation between the degree and the DC. This result is very useful in 
the following sense: the most effective method by which we can 

Figure 3 | The distribution of the domination centrality in double-logarithmic coordinates. The results for scale-free synthetic directed networks with 
N = 5000, $\langle k_{in} \rangle = \langle k_{out} \rangle = \langle k \rangle / 2 = 3$, $P(k_{in}) \sim k_{in}^{-\gamma}$ and $P(k_{out}) \sim k_{out}^{-\gamma}$ are shown. 

Table 2 | Summarized statistics for the domination centrality values in representative networks 

Network | $N_D^{Degree}$ (change rate) | $N_D^{ER}$ (change rate) | DC a | $DC^{Degree}$ a | $DC^{ER}$ a 
GlossTG | … | … | … | … | … 
SmaGri | 458 (5.18%) | 8 (49.12%) | 1.19/1.68/1.95 | 1.35/1.91/2.59 | 41.04/42.72/44.92 
SciMet | 1075 (2.97%) | 81 (39.39%) | 1.84/1.34/1.76 | 2.11/2.42/2.73 | 20.08/23.52/23.50 
Kohonen | … | … | … | … | … 
Homo sapiens | 174 (3.63%) | 24 (18.14%) | 2.59/3.77/5.36 | 2.32/3.94/4.84 | 16.58/20.21/19.65 
SF γ = 2.1 | 2103 (0.88%) | 360 (24.46%) | 1.29/1.67/2.31 | 1.26/1.61/2.25 | 8.66/11.3/12.3 
SF γ = 2.4 | 1633 (1.00%) | | 1.6/2.34/3.06 | 1.59/2.27/2.93 | 
SF γ = 3 | 982 (0.50%) | | 2.44/3.71/4.18 | 2.51/4.07/4.74 | 
SF γ = 4 | 517 (0.30%) | | 4.97/7.06/7.56 | 4.95/6.80/7.33 | 

a The domination centrality average values of Low/Medium/High degree vertices. 



intervene in a system's dynamics is to identify vertices with great 
domination capability, which are not restricted to hubs alone. 
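
The degree-group comparison described above (three equal-sized groups ordered by degree) can be sketched as follows. The function name `average_dc_by_degree_group` is hypothetical, and the handling of ties and of vertex counts not divisible by three is an assumption: vertices are sorted by degree with ties kept in input order, and the group boundaries fall at n // 3 and 2n // 3.

```python
from typing import Dict, Hashable


def average_dc_by_degree_group(degree: Dict[Hashable, int],
                               dc: Dict[Hashable, float]) -> Dict[str, float]:
    """Average DC of the low-, medium- and high-degree thirds of the vertices."""
    ranked = sorted(degree, key=lambda vertex: degree[vertex])
    n = len(ranked)
    if n < 3:
        raise ValueError("at least three vertices are needed for three groups")

    bounds = [0, n // 3, (2 * n) // 3, n]
    averages: Dict[str, float] = {}
    for label, lo, hi in zip(("low", "medium", "high"), bounds, bounds[1:]):
        group = ranked[lo:hi]
        averages[label] = sum(dc[vertex] for vertex in group) / len(group)
    return averages
```

Comparing the three averages for a real network and for its randomised counterparts reproduces the kind of Low/Medium/High triple reported in table 2.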

Clustering analysis. In fact, a consensus among the topological 
criteria for measuring the functional similarity of vertices is often 
lacking in directed networks 28 . Ignoring the direction of the links 
may lead to partial or even misleading clustering results. 
Domination similarity is a direction-specific index and concentrates on 
quantifying the unique relation between the upstream and 
downstream subspaces of vertices in directed networks. Independent of the 
weights of the links used in the calculation, the domination similarity 
is a parameter-free index for analysing data with noise. With the 
global minimum-cost limitation, the domination similarity 
represents the ability of vertices to work synergistically with others 
and provides guidance for dominating a system using multiple 
vertices operating cooperatively at the minimum cost. In this 
section, we apply the DS index to detect and analyse functional 
modules in a glossary network and the enzyme-centric network of 
Homo sapiens. 

We utilise the DS values as the input of the AP algorithm 30 to 
identify the functional modules in the glossary network. In this case, 
we test the performance on a directed word network that has also 
been recently introduced by Newman 31 and Boccaletti 32 . The 
network represents the connections among a set of technical terms, such 
as "Tree" and "Digraph", contained in a glossary of network jargon. 
Vertices represent terms, and a directed link from one vertex to 
another exists in the network iff the second term is used to describe 
the meaning of the first term. Because circular definitions are 
unhelpful and are normally avoided, most links in the network are not 
reciprocal. The statistics for this network are provided in table 1. 



Figure 5 shows the modules identified in this network using our 
DS-based method. This method identifies nine modules in this case, 
which appear to correspond to the meaningful groups in 
understanding the relations among glossary terms. For instance, module 1, 
which is highlighted in red in the figures, deals with words that 
describe tree structure. Remarkably, as an upstream vertex, the term 
"Decision Tree" can be explained by its downstream terms in module 
1, and these downstream vertices constitute the controllable 
subspace of the vertex "Decision Tree". As a downstream vertex, the 
term "Tree" is the basic foundation for the formation of other 
upstream terms, and these upstream vertices constitute the 
observable subspace of the vertex "Tree". Module 9 contains the glossary 
terms derived from the fundamental term "Digraph" and provides 
an overview of the dominance of this term. Additionally, all other 
detected modules represent not only groups of terms with similar 
meanings but also the etymology of the network jargon. Thus, the 
DS-based method appears to identify meaningful structure in the 
network, of a type that could be useful in understanding the broader 
shapes of otherwise poorly understood systems. 

We return now to the global human metabolic (enzyme-centric) 
network. The vertices in this network represent enzymes, and there is 
a directed link from one enzyme to another if the product of a 
reaction catalysed by the first enzyme is used as the substrate of a 
reaction catalysed by the second. The statistics for the human 
enzyme-centric network are provided in table 1. There are 689 
enzymes and 2382 directed links derived from 90 metabolic path- 
ways. Metabolism is a vital cellular process, and its malfunction is a 
major contributor to human disease 33 . Metabolic networks are com- 
plex, and thus, systems-level computational approaches are required 
to elucidate and understand them. Here, we wish to discuss the 




Figure 4 | A schematic diagram illustrating the domination centrality in Homo sapiens networks. (a): The average values of domination centrality 
among low-, medium- and high-degree vertices. The scatter plots of the domination centrality versus the vertex in-degree, out-degree and degree are 
presented in panel (b), panel (c) and panel (d), respectively. The green, blue and purple plots represent the real network, rand-Degree network and 
rand-ER network, respectively. 

Table 3 | Summarized statistics for the distribution of domination centrality 

Network         Average a            Deviation a         Relative entropy b 
GlossTG         1.77/1.62/3.76       0.69/0.46/1.61      0.0386/0.5244 
SmaGri          1.61/1.96/42.87      0.46/0.87/18.80     0.1047/3.2409 
SciMet          1.73/1.93/22.37      0.51/0.71/13.56     0.043/2.7694 
Kohonen         1.43/1.54/13.58      0.31/0.43/8.21      0.02/2.8511 
Homo sapiens    3.96/3.736/18.826    2.42/2.33/11.01     0.0732/1.3426 
SF γ = 2.4      2.32/2.26/10.94      1.02/0.98/7.18      0.0049/1.5441 

a In original/rand-Degree/rand-ER networks. 
b Rand-Degree/rand-ER relative to the original networks. 



interventional effect of enzymes in the metabolic system and detect 
the relevant modules. 

We cluster the network into modules using the AP algorithm 
augmented by our DS index to identify the pathways that correspond 
to metabolic functions. In total, 63 modules are detected by our 
method. We measure the biological quality of the clustering result 
by means of Gene Ontology (GO) enrichment 34 and use the tool GO 
TermFinder 35 to compute the functional enrichment p-values of 
components with respect to their biological process annotations. In 
the results, 28 modules are annotated by GO terms with p-values 
< 0.01 (the most significant p-value is $3.45 \times 10^{-16}$), which means 
that these modules represent significant biological functions in the 



metabolic system. In figure 6, certain representative modules are 
depicted with their corresponding GO terms and p-values. We note 
that not only dense subnetworks but also functional modules, with 
distinctive circle and path structures, are detected. These experi- 
ments provide compelling evidence that the DS is a meaningful 
and practical indicator in accurately depicting the structures of direc- 
ted networks. 
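For readers unfamiliar with enrichment p-values, the following minimal sketch shows the hypergeometric tail probability on which tools of this kind are typically based. It is an illustration only: the function name is hypothetical, no multiple-testing correction is applied, and apart from the 689 enzymes mentioned above the example numbers are invented.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, module_size, hits):
    """P(X >= hits) when `module_size` items are drawn without replacement
    from `population` items, of which `annotated` carry the GO term."""
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, module_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical module: 12 enzymes, 9 of which share a term that annotates
# 40 of the 689 enzymes in the network (only 689 comes from the text).
p_example = hypergeom_enrichment_p(689, 40, 12, 9)
```

A small value of `p_example` indicates that the shared annotation is unlikely to arise from a random selection of enzymes of the same size.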

Discussion 

The domination capability of a vertex reflects the vertex's ability to 
interfere in dynamical control processes in many directed complex 
systems. The key task is to explain the manner in which a vertex 




Figure 5 | Module-detection results in the directed glossary network. The modules are labelled with colours. 

Figure 6 | Module-detection results in the Homo sapiens network. Certain representative modules are marked with colours and the corresponding GO 
terms and p-values are given. 



intervenes in its downstream and upstream spaces in the imple- 
mentation of dynamical functions. In this paper, based on the deter- 
mination of controllable and observable subspaces under the 
minimum-cost condition, we introduced the DC index to assess 
the capabilities of vertices in directed networks. The results of our 
statistical studies demonstrate that the domination centrality is, to a 
great extent, encoded by the degree distribution of the underlying 
network, yet there is no discernible correlation between the degree 
and DC of a vertex. This result provides guidelines for the selection of 
the most effective means through which we can intervene in a sys- 
tem's dynamics. Furthermore, to analyse the cooperative relations 
among vertices in the domination of an entire network, we defined 
the domination similarity, and we were able to detect significant 
functional modules in glossary and metabolic networks through 
clustering. As direction-specific and parameter-free indices, DC 
and DS are effective and practical in accurately depicting the struc- 
tures of directed networks. In our future studies, we intend to invest- 
igate the most effective approach to intervening in the dynamical 
functions of complex systems through selected vertices. 

Methods 

Enumerating all distinct maximum matching link sets (Ms) is infeasible when 
calculating DS because, in the worst case, their number is exponential in the size of 
the network. However, we note that real networks contain many "redundant" links that 
never appear in any maximum matching. Based on their roles in the Ms, links can 
be classified into three categories: "critical" links appear in all Ms, "redundant" 
links appear in none of them and "ordinary" links appear in some, 
but not all, Ms 10 . In combination with the sparseness of real networks, we can 
approximate the control subspaces and observation subspaces of vertices via an 



optimisation routine. As shown in figure 7, we observe the beneficial phenomenon 
that in a real network, there always exists some consistent set of control subspaces and 
observation subspaces of vertices induced by different maximum matchings. This 
observation supports the feasibility of using a small number of maximum matchings 
to approximate the complete control and observation subspaces of vertices. A random 
optimisation can be performed rather quickly using a Markov sampling process. 
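The three link categories can be made concrete by brute force on a very small network. The sketch below enumerates every maximum matching, which is exactly the exponential procedure that is avoided for real networks, so it is meant only as an illustration; the toy link list and the function names are assumptions made for the example.

```python
from itertools import combinations


def is_matching(links):
    """A set of directed links is a matching if no two links share a
    start vertex and no two links share an end vertex."""
    starts = [u for u, _ in links]
    ends = [v for _, v in links]
    return len(set(starts)) == len(starts) and len(set(ends)) == len(ends)


def all_maximum_matchings(links):
    """Enumerate all maximum matchings by exhaustive search (tiny inputs only)."""
    links = list(links)
    for size in range(len(links), 0, -1):
        found = [set(c) for c in combinations(links, size) if is_matching(c)]
        if found:
            return found
    return [set()]


def classify_links(links):
    """Label each link as critical, redundant or ordinary."""
    matchings = all_maximum_matchings(links)
    labels = {}
    for link in links:
        count = sum(link in m for m in matchings)
        if count == len(matchings):
            labels[link] = "critical"
        elif count == 0:
            labels[link] = "redundant"
        else:
            labels[link] = "ordinary"
    return labels


# Invented toy network given as (start, end) links.
toy_links = [(1, 2), (2, 3), (1, 3), (4, 5), (4, 6)]
toy_labels = classify_links(toy_links)
```

In this toy network, the two maximum matchings differ only in whether they use the link (4, 5) or the link (4, 6), so only these two links are ordinary.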

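Algorithm for DS. 
Input: network G(A) 

$\theta = 1$, $\tau = 1$, $t = 0$ 

do 

    Markov random sampling to produce a maximum matching $M$ 

    $CS_i(M) = \{\, j \mid j \text{ is accessible from vertex } i \text{ in } CF(V, (M \cup AL)) \,\}$ 

    $OS_i(M^T) = \{\, j \mid j \text{ is accessible from vertex } i \text{ in } OF(V, (M^T \cup AL^T)) \,\}$ 

    $CS_i = CS_i \cup CS_i(M)$, $OS_i = OS_i \cup OS_i(M^T)$ 

    $\theta = \sum_i \left( |CS_i| + |OS_i| \right)$ 

    if $|\theta - \tau| / \tau < \varepsilon$ then $t = t + 1$ else $t = 0$ 

    $\tau = \theta$ 

while $t < \psi$ 

Calculate Domination Similarity 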

Figure 7 | The influence of the number of random samples among all 
maximum matchings on the domination-similarity result. The growth 
rates of the sum of the complete control and observation subspaces as 
functions of the number of random samples of maximum matchings in the 
GlossTG, Homo sapiens and SmaGri networks are shown. 

In every loop, we randomly produce a maximum matching $M$ and update the 
complete control subspace CS and complete observation subspace OS of each vertex 
by merging in the additional accessible vertices introduced by this $M$. Vertices are 
added to the CSs and OSs but never deleted throughout the entire procedure. $\theta$ is the 
sum of the sizes of all CSs and OSs. If the rate of increase of $\theta$ is less than $\varepsilon$ for $\psi$ 
consecutive loops, the random optimisation procedure terminates, and we then calculate 
the Jaccard similarity coefficients of the CSs and OSs of two arbitrary vertices. The 
growth rates of $\theta$ during the random optimisation procedures for the GlossTG, Homo 
sapiens and SmaGri networks are presented in figure 7. Clear improvement in $\theta$ is 
achieved for 90, 7845 and 4527 maximum matchings in 247, 27995 and 15172 random 
samples for GlossTG, Homo sapiens and SmaGri, respectively. We note that the growth 
rate of $\theta$ rapidly decreases to nearly 0 as the number of samples increases. This 
observation supports the appropriateness of using only a limited number of Ms to 
approximate the domination similarities of vertices in real networks. We set 
$\varepsilon = 0.000001$ and $\psi = 50$ in the clustering-analysis case studies. 
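The stopping rule and the final similarity step can be sketched as follows. This is a minimal illustration under stated assumptions: `sample_once` is a hypothetical callable that returns, for one sampled maximum matching, two dictionaries mapping each vertex to its control and observation subspaces; the guard that keeps the previous sum at 1 or more is an addition to avoid division by zero; and the way the two Jaccard coefficients are combined into DS is not shown.

```python
import itertools


def jaccard(a, b):
    """Jaccard similarity coefficient of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def accumulate_subspaces(sample_once, vertices, eps=1e-6, psi=50):
    """Merge sampled subspaces until the total size theta grows by a
    relative amount below eps for psi consecutive samples."""
    cs = {v: set() for v in vertices}
    os_ = {v: set() for v in vertices}
    tau, t, n_samples = 1, 0, 0
    while t < psi:
        cs_m, os_m = sample_once()
        for v in vertices:
            cs[v] |= cs_m.get(v, set())   # vertices are added, never deleted
            os_[v] |= os_m.get(v, set())
        theta = sum(len(cs[v]) + len(os_[v]) for v in vertices)
        t = t + 1 if abs(theta - tau) / tau < eps else 0
        tau = max(theta, 1)               # guard (assumption): avoid dividing by 0
        n_samples += 1
    return cs, os_, n_samples


# Invented stand-in for the Markov sampler: it alternates between two
# fixed results so that the loop is deterministic.
fake_samples = itertools.cycle([
    ({1: {2}}, {2: {1}}),
    ({1: {2, 3}}, {3: {1}}),
])
cs, os_, n_samples = accumulate_subspaces(
    lambda: next(fake_samples), [1, 2, 3], eps=1e-6, psi=3
)
similarity_cs = jaccard(cs[1], cs[2])
```

With a real sampler, `psi` and `eps` would take the values reported above (50 and 0.000001); the small `psi` here only keeps the toy run short.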

A Markov process, as described by Jia et al. 36,37 , performs unbiased random 
sampling among all maximum matchings and can be used to estimate the role of each 
vertex in controlling the network. This algorithm randomly chooses a vertex in a 
given M, enumerates, by removing all of its links, all alternative maximum matchings 
that include all other elements except this vertex, then randomly chooses one of these 
alternative maximum matchings as the current M and repeats the process. 

However, removing a vertex in a random sampling may not be effective for cal- 
culating DS. Usually, we identify a maximum matching in a bipartite graph and 
attempt to increase the matching size via an augmenting path that begins at a matched 
vertex, ends at an unmatched vertex and alternates between unmatched and matched 
links on the path. For example, in figure 8, a bipartite graph that is separated into the 






Figure 8 | A schematic diagram illustrating the process of random 
sampling. (a): The original network. (b): A bipartite graph separated into 
the out and in sets; the red link set $\{l_{1,2}, l_{2,3}, l_{4,5}\}$ forms a maximum 
matching, $M_1$, and the blue path is an augmenting path when the matched 
link $l_{1,2}$ is removed. (c): A new maximum matching $M_2$ constructed by 
alternating the blue augmenting path. 

out and in sets is constructed in figure 8(b) for the network in figure 8(a). The red links 
are matched, the black dotted links are unmatched, and the matched link set 
$\{l_{1,2}, l_{2,3}, l_{4,5}\}$ forms a maximum matching $M_1$. Proceeding from this maximum 
matching, we randomly choose vertex 2 and leave the current matched vertices and 
links unchanged. Instead of removing all links of vertex 2, we delete only the matched 
link $l_{1,2}$. Then, we can identify an augmenting path that begins at the relevant matched 
vertex 1 and ends at the presently unmatched vertex 2; in the figure, this path is 
indicated by a blue line. Finally, by alternating between unmatched and matched links 
on this blue path, we obtain a new maximum matching $M_2$ in which the matching of 
vertex 2 has been replaced, as shown in figure 8(c). By contrast, because it removes all 
links of vertex 2, the method of Jia et al. 37 cannot produce any new maximum 
matching from the given $M_1$. Nevertheless, we use the Markov process defined by 
these authors to perform unbiased random sampling among all maximum matchings 
to estimate DS. The only difference is that we enumerate all alternative maximum 
matchings that include all other elements except the matched link of the chosen 
vertex. 
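The link-removal step can be illustrated with a standard augmenting-path search (Kuhn's algorithm) on the bipartite out/in representation. This sketch is an assumption-laden stand-in rather than the enumeration procedure itself: it returns only one alternative matching instead of all of them, the function names are hypothetical, and the toy network is invented rather than taken from figure 8.

```python
def try_augment(u, adj, match_in, banned, seen):
    """Depth-first search for an augmenting path that starts at the
    unmatched out-vertex u; match_in maps an in-vertex to its out-vertex."""
    for v in adj.get(u, ()):
        if (u, v) in banned or v in seen:
            continue
        seen.add(v)
        # Use v if it is free, or if its current partner can be re-matched.
        if v not in match_in or try_augment(match_in[v], adj, match_in, banned, seen):
            match_in[v] = u
            return True
    return False


def maximum_matching(adj, banned=frozenset()):
    """Return one maximum matching as a set of (out, in) links."""
    match_in = {}
    for u in adj:
        try_augment(u, adj, match_in, banned, set())
    return {(u, v) for v, u in match_in.items()}


def swap_out_link(adj, matching, link):
    """Delete only the matched link `link` and try to restore the matching
    size through an augmenting path that avoids it."""
    match_in = {v: u for u, v in matching if (u, v) != link}
    matched_out = set(match_in.values())
    for u in adj:
        if u not in matched_out and try_augment(u, adj, match_in, {link}, set()):
            return {(a, b) for b, a in match_in.items()}
    return None  # no equally large matching avoids this link


# Invented toy network: out-vertex -> list of reachable in-vertices.
toy_adj = {1: [2, 4], 3: [4, 2]}
m1 = maximum_matching(toy_adj)
m2 = swap_out_link(toy_adj, m1, (1, 2))
```

In the toy network, the augmenting path runs from vertex 1 via vertices 4 and 3 to vertex 2, so the new matching re-matches vertex 2 without discarding its remaining links. A return value of `None` identifies a link that no maximum matching can do without.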

In fact, the maximum matchings reveal the functions and roles that the vertices 
and links play in controlling the whole network with minimum cost. Different 
combinations of ordinary links constitute different maximum matchings and produce 
different choices of minimum-control-cost configurations. Ordinary links are 
alternatives for constructing the backbone of the propagation of control influence. 
Therefore, we consider each combination of ordinary links (COL) to be one state. The 
set of ordinary links of a network is $\{l_1, l_2, \ldots, l_v\}$, where $v$ is the number of ordinary 
links. The Markov chain can be characterised by a transition matrix $P$ with the 
elements $P_{ij} = T_{l_i} \times (1 - Q_i) \times Q_j$, where $Q_i$ is the probability of ordinary link $l_i$ being 
included in an M. The transition from state $i$ to state $j$ requires the choice of a matched 
link from an M, with a probability of $T_{l_i} = 1/|M|$; the choice of a COL set that excludes 
$l_i$, with a probability of $(1 - Q_i)$; and the choice of a COL set that includes $l_j$, with a 
probability of $Q_j$. Because $P_{ij} \neq P_{ji}$ in general, our algorithm is not guaranteed to choose each set 

Figure 9 | The distribution of the counts in each of the 164 COLs in the GlossTG network. (a): A matched link $l_i$ is randomly selected from an M with a 
probability of $1/|M|$. (b): The selection probability is adjusted based on the number of alternative COLs that are enumerated by our sampling procedure. 

of matched links with equal probability. For example, in figure 9(a), for the real 
network GlossTG with 164 COLs, we perform 239,000 iterations of our sampling 
algorithm and count the number of times that each COL is picked. We find that a few 
COLs are sampled many times, but it is very difficult to ensure that all COLs are 
sampled at least once. Thus, we adjust the transition matrix $P$ and construct a new 
transition matrix $P'$ with the elements $P'_{ij} = T'_{l_i} \times (1 - Q_i) \times Q_j$. If we can set 
$T'_{l_i} = T_{l_j} \times (1 - Q_j) \times Q_i$, then $P'_{ij} = P'_{ji}$, meaning that the transition matrix $P'$ is 
symmetric and the steady-state distribution possesses equal probabilities for all states. 
However, the $Q_i$ cannot be determined effectively; in practice, $u_j / \sum_k u_k$ is used to 
approximate $(1 - Q_j)$, where $u_j$ is the average number of all alternative COLs that can 
be enumerated by removing $l_j$ in the first $|M|^2$ iterations of our sampling procedure. 
Intuitively, if many alternative COLs can be enumerated by removing $l_j$, then the 
probability of choosing $l_j$ from an M should be increased. With this modification, the 
sampling procedure becomes more efficient, and the 164 COLs in GlossTG can be 
obtained within 20,000 iterations. As shown in figure 9(b), we run our modified 
sampling algorithm for 193,500 iterations and count the number of times that each COL 
is picked. The result demonstrates that this procedure provides a more even-handed 
random sampling among all maximum matchings. 
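The symmetry of the adjusted transition matrix can be verified in one step. The following is a worked check under the reading that $P'_{ij} = T'_{l_i}(1 - Q_i)Q_j$, that $T'_{l_i} = T_{l_j}(1 - Q_j)Q_i$, and that $T_{l_i} = T_{l_j} = 1/|M|$ because all maximum matchings have the same size; it is a sanity check on these definitions rather than an additional result.

```latex
\begin{align*}
P'_{ij} &= T'_{l_i}\,(1 - Q_i)\,Q_j
         = \frac{1}{|M|}\,(1 - Q_j)\,Q_i\,(1 - Q_i)\,Q_j, \\
P'_{ji} &= T'_{l_j}\,(1 - Q_j)\,Q_i
         = \frac{1}{|M|}\,(1 - Q_i)\,Q_j\,(1 - Q_j)\,Q_i
         = P'_{ij}.
\end{align*}
```

A symmetric transition matrix whose rows sum to one also has columns that sum to one, so the uniform distribution over the states is stationary, which is the property that the adjusted sampler aims for.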